Genome-wide analysis points to roles for extracellular matrix 
remodeling, the visual cycle, and neuronal development in myopia 

Amy K. Kiefer¹, Joyce Y. Tung¹, Chuong B. Do¹, David A. Hinds¹, Joanna L. Mountain¹, Uta Francke¹, and Nicholas Eriksson¹*

¹23andMe, Inc., Mountain View, CA, USA
*nick@23andme.com

September 11, 2012 



Abstract 

Myopia, or nearsightedness, is the most common eye disorder, resulting primarily from excess elongation of the eye. The etiology of myopia, although known to be complex, is poorly understood. Here we report the largest ever genome-wide association study (43,360 participants) on myopia in Europeans. We performed a survival analysis on age of myopia onset and identified 19 significant associations (p < 5 × 10^-8), two of which are replications of earlier associations with refractive error. These 19 associations in total explain 2.7% of the variance in myopia age of onset, and point towards a number of different mechanisms behind the development of myopia. One association is in the gene PRSS56, which has previously been linked to abnormally small eyes; one is in a gene that forms part of the extracellular matrix (LAMA2); two are in or near genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are near genes known to be involved in the growth and guidance of retinal ganglion cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal signaling or development. These novel findings point towards multiple genetic factors involved in the development of myopia and suggest that complex interactions between extracellular matrix remodeling, neuronal development, and visual signals from the retina may underlie the development of myopia in humans.



Author Summary 

The genetic basis of myopia, or nearsightedness, is believed to be complex and affected by multiple genes. Two genetic association studies have each identified a single genetic region associated with myopia in European populations. Here we report the results of the largest ever genetic association study on myopia, in over 40,000 people of European ancestry. We identified 19 genetic regions significantly associated with myopia age of onset. Two are replications of the previously identified associations, and 17 are novel. Thirteen of the novel associations are in or near genes implicated in eye development, neuronal development and signaling, the visual cycle of the retina, and general morphology: DLG2, KCNMA1, KCNQ5, LAMA2, LRRC4C, PRSS56, RBFOX1, RDH5, RGR, SFRP1, TJP2, ZBTB38, and ZIC2. These findings point to numerous biological pathways involved in the development of myopia and, in particular, suggest that early eye and neuronal development may lead to the eventual development of myopia in humans.

Introduction 

Myopia, or nearsightedness, is the most common eye disorder worldwide. In the United States, an estimated 30-40% of the adult population has clinically relevant myopia (more severe than -1 diopter), and the prevalence has increased markedly in the last 30 years [1][2]. Myopia is a refractive error that results primarily from increased axial length of the eye [3]. The increased physical length of the eye relative to its optical length causes images to be focused in front of the retina, resulting in blurred vision.

The etiology of myopia is multifactorial [3]. Briefly, postnatal eye growth is directed by visual stimuli that evoke a signaling cascade within the eye. This cascade is initiated in the retina and passes through the choroid to guide remodeling of the sclera (the white part of the eye) (cf. [4][5]). Animal models implicate these visually-guided alterations of the scleral extracellular matrix in the eventual development of myopia [4][6].

In humans the eye typically grows about 5 mm from birth to age six, and vision typically improves during this period [7]. At age six only about 2% of children are myopic [7]. Although the eye grows only 0.5 mm through puberty [8], the incidence of myopia increases sevenfold [7], peaking between the ages of 9 and 14 [9]. Myopia that develops during childhood or early adolescence generally worsens throughout adolescence and then stabilizes by age 20. Compared to myopia that develops in childhood or adolescence, adult onset myopia tends to be less severe [10-12].

Epidemiological studies have implicated numerous environmental factors in the development of myopia, most notably education, outdoor exposure, reading, and near work [3]. Nevertheless, it is well established that genetics plays a substantial role. Twin and sibling studies have provided heritability estimates that range from 50% to over 90% [13-17]. Children of myopic parents tend to have longer eyes and are at increased risk of developing myopia in childhood [18]. Segregation analyses suggest that multiple genes are involved in the development of myopia [19][20]. To date, there have been seven genome-wide association studies (GWAS) on myopia or related phenotypes (pathological myopia, refractive error, and ocular axial length): two in Europeans [21][22] and five in Asian populations [23-27]. Each of these publications has identified a different single association with myopia. In addition, there have been several linkage studies (see [3][28] for reviews) and an exome sequencing study of severe myopia [29].

The previous GWAS were relatively small (up to approximately 5,000 initial cases) and used degree of refractive error as a quantitative dependent measure. In contrast, we analyzed data for 43,360 individuals from the 23andMe database who reported whether they had been diagnosed with nearsightedness and, if so, at what age. We performed a genome-wide survival analysis on age of onset of myopia and discovered 19 genome-wide significant associations with myopia age of onset, 17 of which are novel.



Results and Discussion 

Participants reported via a web-based questionnaire whether they had been diagnosed with nearsightedness and, if so, at what age. All participants were customers of 23andMe and of primarily European ancestry; no pair was more closely related than at the level of first cousins. We performed a genome-wide survival analysis using a Cox proportional hazards model on 43,360 individuals (the "discovery set"). This model assumes that there is an (unknown) baseline hazard, the rate of developing myopia at each age among those who have not yet developed it. The model then tests whether each single nucleotide polymorphism (SNP) is associated with a significantly higher or lower hazard of developing myopia relative to that baseline. The Cox model can be thought of as a generalization of an analysis of myopia age of onset. Unlike an analysis of age of onset alone, the Cox model allows for the inclusion of non-myopic controls, which considerably increases statistical power. Analyses controlled for sex and five principal components of genetic ancestry. An additional, non-overlapping set of 4,277 participants was used as a replication set. These participants answered a separate question about their use of corrective eyewear for nearsightedness before the age of ten. See Table 1 for characteristics of the two cohorts.

Cohort                          Number   % female   Age (SD)      Age of onset (SD)
Discovery, myopic               26038    46.1       48.6 (15.7)   16.4 (11.0)
Discovery, not myopic           17322    39.6       49.1 (17.1)
Replication, myopic at 10       800      45.1       47.7 (14.9)   < 10
Replication, not myopic at 10   3477     45.2       50.0 (16.6)

Table 1: Cohort statistics. Sex, current age, and age of onset for discovery and replication cohorts.

Table 2 shows the top SNPs for all 27 genetic regions associated with myopia with a p-value smaller than 10^-6. All p-values from the GWAS have been corrected for the genomic control inflation factor of 1.14. A total of 19 of the SNPs cross our threshold for genome-wide significance (p < 5 × 10^-8; see Figure S1). These 19 include two SNPs previously associated with refractive error in GWAS of European populations: rs524952 near GJD2 and ACTC1, and rs4778882 near RASGRF1 [21][22][30]. Genome-wide p-values are shown in Figure 1, and Figure S2 shows the quantile-quantile plot for the analysis.
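
Genomic control deflates each test statistic by the inflation factor λ (here 1.14) before the statistic is converted to a p-value. The sketch below shows that step for a 1-degree-of-freedom test. It is a minimal standard-library version that starts from a two-sided p-value for convenience; in practice one would divide the likelihood-ratio statistic directly. The helper name `gc_correct` is hypothetical and does not come from the study's pipeline.

```python
import math
from statistics import NormalDist


def gc_correct(p: float, lam: float = 1.14) -> float:
    """Apply genomic control to a p-value from a 1-df test (hypothetical helper).

    The p-value is mapped to a 1-df chi-square statistic (a squared normal
    quantile), divided by the inflation factor lam, and mapped back.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    # Use the lower-tail quantile of p/2 so that tiny p-values (e.g. 1e-42)
    # do not make 1 - p/2 round to exactly 1.0.
    z = NormalDist().inv_cdf(p / 2.0)
    chi2 = (z * z) / lam  # deflated 1-df chi-square statistic
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


for raw in (5e-8, 1e-6, 0.05):
    print(f"{raw:.1e} -> {gc_correct(raw):.2e}")
```

With λ ≥ 1 the corrected p-value is never smaller than the uncorrected one. This is why the 5 × 10^-8 threshold is applied after correction.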

Of the 19 SNPs significant in the discovery set, nine were also significant (p < 0.05) in the replication set (Table 2). The replication set was small (barely a tenth the size of the discovery set) and measured age of onset only coarsely (before or after age 10), so it is not surprising that not all SNPs replicated. We defined a genetic myopia propensity score as the number of copies of the risk alleles across all 19 SNPs identified via the discovery set. The propensity score showed a strong association with early onset myopia (before age 10) in our replication cohort (p = 6.5 × 10^-12, odds ratio 1.08 per risk allele). Individuals in the top decile of genetic propensity had 2.54 times the odds of developing myopia before the age of 10 compared with those in the bottom decile. In a Cox model fit to the discovery set, the propensity score explains 2.7% of the total variance. Note that this estimate may be inflated, as it is calculated on the discovery population. In this model, someone in the 90th percentile of risk (a score of 21.4) is nearly twice as likely to develop myopia by the age of 60 as someone in the 10th percentile of risk (a score of 14.3; Figure 2).
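
The propensity score counts risk alleles, where the risk allele at each SNP is whichever allele raises the hazard. Table 2 reports hazard ratios per copy of the minor allele. Under that reading, the minor allele is the risk allele when HR > 1 and the major allele otherwise. The sketch below follows this assumption. The function and argument names are hypothetical, and dosages may be fractional because many genotypes are imputed.

```python
from typing import Sequence


def propensity_score(minor_dosages: Sequence[float],
                     hazard_ratios: Sequence[float]) -> float:
    """Count (possibly fractional, imputed) risk alleles across SNPs.

    minor_dosages: estimated number of minor alleles (0-2) at each SNP.
    hazard_ratios: per-minor-allele hazard ratio at each SNP.
    """
    if len(minor_dosages) != len(hazard_ratios):
        raise ValueError("need one hazard ratio per SNP")
    score = 0.0
    for dosage, hr in zip(minor_dosages, hazard_ratios):
        if not 0.0 <= dosage <= 2.0:
            raise ValueError("dosage must lie in [0, 2]")
        # The minor allele raises risk when HR > 1; otherwise the major allele does.
        score += dosage if hr > 1.0 else 2.0 - dosage
    return score


# Three SNPs with hazard ratios taken from Table 2 (LAMA2, LRRC4C, RDH5).
print(propensity_score([0.0, 1.0, 2.0], [0.798, 1.149, 0.892]))  # prints 3.0
```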

Of the 17 novel associations, many lie in or near genes with direct links to processes related to myopia development. Below, we briefly sketch possible connections between these associations and extracellular matrix (ECM) remodeling, the visual cycle, eye and body growth, retinal neuron development, and general neuronal development or signaling.




Figure 2: Estimated survival curves by genetic propensity score. The genetic propensity score is computed as the number of risk alleles across the 19 genome-wide significant SNPs. Curves show estimated survival probability (i.e., the probability of not having developed myopia) by age under the fitted Cox model for the 10th, 50th, and 90th percentiles of scores (14.3, 17.9, and 21.4, respectively).
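
In a Cox model, a covariate value x scales the baseline cumulative hazard. The survival curve for score x is therefore S(t | x) = S0(t)^exp(β x), where S0 is the baseline survival at x = 0 with the other covariates held fixed. The sketch below shows how curves like those in Figure 2, and a ratio of cumulative risks by a given age, could be derived from a fitted model. The baseline survival values and the coefficient are hypothetical placeholders, not estimates from this study.

```python
import math
from typing import List


def survival_given_score(baseline_survival: List[float], beta: float,
                         score: float) -> List[float]:
    """Cox-model survival curve S0(t)**exp(beta * score) on a grid of ages."""
    multiplier = math.exp(beta * score)  # hazard ratio relative to score 0
    return [s0 ** multiplier for s0 in baseline_survival]


def cumulative_risk_ratio(baseline_survival_at_t: float, beta: float,
                          high: float, low: float) -> float:
    """Ratio of P(myopic by age t) for two scores under the Cox model."""
    risk_high = 1.0 - baseline_survival_at_t ** math.exp(beta * high)
    risk_low = 1.0 - baseline_survival_at_t ** math.exp(beta * low)
    return risk_high / risk_low


# Hypothetical inputs for illustration only (not fitted values).
ages = [10, 20, 40, 60]
s0 = [0.98, 0.95, 0.93, 0.92]  # placeholder baseline survival at score 0
beta = 0.07                    # placeholder log-hazard ratio per risk allele
for label, score in [("10th", 14.3), ("50th", 17.9), ("90th", 21.4)]:
    curve = survival_given_score(s0, beta, score)
    print(label, [round(s, 3) for s in curve])
print(round(cumulative_risk_ratio(s0[-1], beta, 21.4, 14.3), 2))
```

When the outcome is rare by age t, the cumulative risk ratio approaches the hazard ratio exp(β (high - low)). For a common outcome such as myopia, it is smaller than the hazard ratio.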
Figure 1: Negative log10 p-values genome-wide for myopia. Regions are named with their postulated candidate gene or genes. p-values under 10^-25 have been cut off (only the LAMA2 region is affected). See Figure S1 for plots in each region with a significant association.



rsid            chr  Position   Genes          MAF    r²     Alleles  HR (CI)               p-value        p_repl
rs12193446      6    129820038  LAMA2          0.094  0.991  A/G      0.798 (0.773-0.823)   1.0 × 10^-42   4.9 × 10^-4
rs11602008      11   40149305   LRRC4C         0.160  0.887  A/T      1.149 (1.121-1.177)   1.3 × 10^-24   0.012
rs17648524      16   7459683    RBFOX1         0.364  0.974  G/C      1.095 (1.075-1.114)   3.5 × 10^-20   0.27
rs3138142       12   56115585   RDH5           0.218  0.817  C/T      0.892 (0.872-0.913)   1.2 × 10^-19   0.0074
chr8:60178580   8    60178580   TOX/CA8        0.358  0.970  C/G      0.917 (0.900-0.934)   2.6 × 10^-18   0.26
rs7744813       6    73643289   KCNQ5          0.405  0.955  A/C      0.920 (0.904-0.937)   1.7 × 10^-17   0.0016
rs524952        15   35005886   GOLGA8B/GJD2   0.468  0.980  T/A      1.078 (1.059-1.097)   3.3 × 10^-15   0.0019
rs2137277       8    40734662   SFRP1          0.189  0.922  A/G      0.908 (0.887-0.929)   1.8 × 10^-14   0.52
rs1550094       2    233385396  PRSS56         0.306  0.963  A/G      1.077 (1.057-1.098)   4.9 × 10^-13   0.019
rs11681122      2    146786063  PABPCP2        0.425  0.940  A/G      0.937 (0.920-0.954)   3.6 × 10^-11   0.085
rs7624084       3    141093285  ZBTB38         0.435  0.961  T/C      0.942 (0.925-0.959)   3.8 × 10^-10   0.19
rs1898585       2    178660450  PDE11A         0.163  0.942  C/T      1.085 (1.059-1.111)   4.9 × 10^-10   0.011
rs2908972       17   11407259   SHISA6         0.397  0.970  T/A      1.060 (1.042-1.079)   1.7 × 10^-9    0.053
rs6480859       10   79081948   KCNMA1         0.363  0.987  C/T      1.060 (1.042-1.079)   2.0 × 10^-9    0.82
rs10736767      11   84637065   DLG2           0.451  0.996  A/C      1.058 (1.040-1.077)   2.2 × 10^-9    0.53
rs11145746      9    71834380   TJP2           0.198  0.886  G/A      1.076 (1.052-1.100)   4.2 × 10^-9    0.77
rs4291789       13   100672921  ZIC2           0.326  0.724  C/G      1.070 (1.048-1.093)   6.0 × 10^-9    2.2 × 10^-4
rs4778882       15   79382019   RASGRF1        0.399  0.951  A/G      1.059 (1.040-1.078)   6.1 × 10^-9    0.017
rs745480        10   85986554   RGR            0.474  0.975  C/G      1.056 (1.038-1.075)   8.0 × 10^-9    0.32
rs5022942       4    81959966   BMP3           0.229  0.991  G/A      1.063 (1.041-1.085)   5.9 × 10^-8    0.21
rs9365619       6    164251746  QKI            0.457  0.999  C/A      1.051 (1.033-1.069)   1.0 × 10^-7    0.097
rs1031004       4    80516849   ANTXR2         0.261  0.983  T/A      1.058 (1.037-1.079)   1.5 × 10^-7    0.62
rs17428076      2    172851936  HAT1/MAP1D     0.236  0.985  C/G      0.943 (0.924-0.963)   1.6 × 10^-7    0.18
chr14:54413001  14   54413001   BMP4           0.489  0.930  G/C      0.952 (0.935-0.969)   4.6 × 10^-7    0.38
rs7272323       20   4756691    PRND/RASSF2    0.409  0.955  C/G      1.050 (1.031-1.068)   7.0 × 10^-7    0.015
chr11:65348347  11   65348347   EHBP1L1        0.017  0.558  G/A      0.781 (0.711-0.858)   7.9 × 10^-7    0.97
rs55819392      21   40038207   ERG/ETS2       0.259  0.987  G/A      0.949 (0.930-0.968)   9.2 × 10^-7    0.014

Table 2: Index SNPs for regions with (λ-corrected) p-values under 10^-6. Positions and alleles are given relative to the positive strand of build 37 of the human genome; alleles are listed as major/minor. The listed genes are the postulated candidate genes in each region; r² is the estimated imputation accuracy; HR (CI) is the hazard ratio per copy of the minor allele with its confidence interval; p-value is the p-value in the discovery cohort; p_repl is the p-value in the replication cohort.
Extracellular Matrix Remodeling

The strongest association is a SNP in an intron of LAMA2 (laminin, alpha 2 subunit; rs12193446, p = 1.0 × 10^-42, hazard ratio (HR) 0.80). Laminins are extracellular structural proteins that are integral parts of the ECM. Changes in the composition of the scleral ECM have been shown to alter the axial length of the eye [5]. Laminins play a role in the development and maintenance of different eye structures [31][32]. The laminin alpha 2 chain is found in the extraocular muscles during development [31]. It may also act as an adhesive substrate and possibly a guidance cue for retinal ganglion cell growth cones [33]. We also found a suggestive association related to laminin (rs1031004, p = 1.5 × 10^-7, HR = 1.06) 312 kb upstream of ANTXR2 (anthrax toxin receptor 2). ANTXR2 binds laminin and possibly collagen type IV [34], and thus may also be involved in extracellular matrix remodeling.

The Visual Cycle 

Two of the novel associations are in or near genes involved in the regeneration of 11-cis-retinal, the light-sensitive component of photoreceptors; this regeneration is commonly referred to as the visual cycle of the retina. The first is rs3138142 (p = 1.2 × 10^-19, HR = 0.89) in RDH5 (retinol dehydrogenase 5 (11-cis/9-cis)). The second is rs745480 (p = 8.0 × 10^-9, HR = 1.06), a SNP 18 kb upstream of RGR, which encodes the retinal G protein-coupled receptor. The SNP rs3138142 is a synonymous change in RDH5. It has been linked to RDH5 expression [35][36], and it is part of an Nr2f2 (nuclear receptor subfamily 2, group F, member 2) transcription factor binding motif in mouse [37][38]. Both RDH5 and RGR play crucial roles in the regeneration of 11-cis-retinal in the retinal pigment epithelium (RPE) [39]. Mutations in RDH5 have been linked with fundus albipunctatus, a rare form of congenital stationary night blindness (for a recent review, see [40]), and with progressive cone dystrophy [41]. Mutations in RGR have been linked with autosomal recessive and autosomal dominant retinitis pigmentosa [42][43].

We also identified an association within another gene that functions in the RPE: rs7744813 (p = 1.7 × 10^-17, HR = 0.92), a SNP in KCNQ5 (potassium voltage-gated channel, KQT-like subfamily, member 5). KCNQ5 encodes a potassium channel found in the RPE and neural retina. These channels are believed to contribute to ion flow across the RPE [44][45] and to affect the function of cone and rod photoreceptors [45].

Eye and Body Growth

Three of our associations show possible links to eye or overall body morphology. The first is a missense variant in PRSS56 (A224T, rs1550094, p = 4.9 × 10^-13, HR = 1.08). Other mutations in PRSS56 have been shown to cause strikingly small eyes with severe decreases in axial length [46-48]. The two other associated SNPs are in linkage disequilibrium (LD) with SNPs previously associated with height. One is rs10113215 (p = 2.6 × 10^-18, HR = 0.92), near TOX and CA8 (thymus high mobility group box protein; carbonic anhydrase VIII). The other is rs7624084 (p = 3.8 × 10^-10, HR = 0.94), near ZBTB38 (zinc finger and BTB domain-containing protein 38). The SNPs rs10113215 and rs7624084 are in LD with rs6569648 and rs6763931, respectively (r² > 0.6 and r² > 0.8), both of which have been associated with height [49][50].

Retinal Ganglion Cell Projections

Two of the novel associations are near genes that affect the outgrowth of retinal ganglion neurons during development. The first is rs4291789 (p = 6.0 × 10^-9, HR = 1.07), which lies 34 kb downstream of ZIC2 (Zic family member 2). ZIC2 regulates two independent aspects of ipsilateral retinal ganglion cell development: axon repulsion at the optic chiasm midline [51][52], and organization of the axonal projections at their final targets in the brain [53].

The second, rs2137277 (p = 1.8 × 10^-14, HR = 0.91), is a variant in ZMAT4 (zinc finger, matrin-type 4). ZMAT4 has no known link to vision, but this variant also lies 385 kb downstream of SFRP1 (secreted frizzled-related protein 1). SFRP1 is involved in the differentiation of the optic cup from the neural retina [54], retinal neurogenesis [55], the development and function of photoreceptor cells [56][57], and the growth of retinal ganglion cells [58].



Neuronal Signaling and Development 

Finally, we found five associations with SNPs in genes involved in neuronal development and signaling but without a known role in vision development or the visual cycle:

- KCNMA1 (potassium large conductance calcium-activated channel, subfamily M, alpha member 1; rs6480859, p = 2.0 × 10^-9, HR = 1.06)
- RBFOX1 (RNA binding protein, fox-1 homolog; rs17648524, p = 3.5 × 10^-20, HR = 1.10)
- LRRC4C (leucine rich repeat containing 4C, also known as NGL-1; rs11602008, p = 1.3 × 10^-24, HR = 1.15)
- DLG2 (discs, large homolog 2; rs10736767, p = 2.2 × 10^-9, HR = 1.06)
- TJP2 (tight junction protein 2; rs11145746, p = 4.2 × 10^-9, HR = 1.08)

KCNMA1 encodes the pore-forming alpha subunit of a MaxiK channel. MaxiK channels are a family of large conductance, voltage- and calcium-sensitive potassium channels involved in the control of smooth muscle and neuronal excitation. RBFOX1 belongs to a family of RNA binding proteins that regulate the alternative splicing of several neuronal transcripts implicated in neuronal development and maturation [59]. LRRC4C encodes a binding partner for netrin G1 and promotes the outgrowth of thalamocortical axons [60]. DLG2 plays a critical role in the formation and regulation of protein scaffolding at postsynaptic sites [61]. TJP2 has been linked with hearing loss: its duplication and subsequent overexpression are found in adult-onset progressive nonsyndromic hearing loss [62].



Conclusion 

This study represents the largest GWAS on myopia in Europeans to date. This cohort of 43,360 individuals led to the discovery of 17 novel associations as well as replication of the two previously reported associations in Europeans. The earlier studies used refractive error as a quantitative outcome; in contrast, we used a Cox proportional hazards model with age of onset of myopia as our major endpoint. This model yielded greater statistical power than a simple case-control study of myopia. All but one of the 19 significant SNPs had smaller p-values under the hazards model than under a case-control analysis of the same dataset. Only 13 of the 19 would be genome-wide significant using the case-control analysis (Table S1).

The proportional hazards model assumes that each SNP has a constant effect on the hazard of developing myopia at all ages. When we tested the validity of this assumption for the 19 significant SNPs, five showed evidence of different effects at different ages (Table S2). This violation should not lead to overly small p-values for these SNPs in the GWAS. However, it does make risk prediction based on these results less straightforward. These age-dependent effects suggest that different biological processes may affect the development of myopia at different ages. For example, rs12193446 in LAMA2 shows a large effect on myopia hazard at an early age, peaking around 11 years, and then a null or even negative effect on hazard after the age of 30. Other SNPs show different patterns of effect as a function of age (Figure S3).

Our findings further suggest that there may be somewhat different genetic factors underlying myopia age of onset and refractive error. Adult onset myopia tends to be less severe than myopia that develops in childhood or adolescence [10-12]. Age of onset is therefore likely correlated with refractive error, but it is not known how strongly. Many of our associations showed a stronger signal than the two associations near GJD2 and RASGRF1 previously linked with refractive error in Europeans. Notably, the latter association, near RASGRF1, also failed to replicate in a recent meta-analysis [30]. Many of our associations with strong effects on age of onset have not shown up in previous refractive error GWAS. This implies that some genetic factors may affect the age of onset independent of eventual severity. It also implies that the strength of different genetic associations with myopia may depend on the specific phenotype under study.

We also note that our phenotype was based on participants' reports rather than clinical assessments. Errors in recall could in theory have affected our results. However, we expect that the vast majority of people are able to recall when they first wore glasses with at most a few years of error.
The five associations previously reported in pathological myopia or refractive error GWAS in Asian populations [23-27] show no overlap with the significant or suggestive regions found here. Nor did we find an association with the ZNF644 locus. That locus was identified as the site of high-penetrance, autosomal dominant mutations in Han Chinese families with apparent monogenic inheritance of high-grade myopia [29]. This lack of overlap is further evidence that the genetic factors involved in myopia differ across populations.

Our identification of 17 novel genetic associations suggests several novel genetic pathways in the development of human myopia. These findings augment existing research on the development of myopia, which to date has been studied primarily in animal models of artificially induced myopia. Some of the associations are consistent with the current view, based largely on animal models, of how myopia develops. In that view, a visually-triggered signaling cascade from the retina ultimately guides the scleral remodeling that leads to eye growth, and the RPE plays a key role in this process [I]. A number of the novel associations point to the potential importance of early neuronal development in the eventual development of myopia, particularly the growth and topographical organization of retinal ganglion cells. These associations suggest that early neuronal development may also contribute to future refractive errors. We expect that these findings will drive new research into the complex etiology of myopia.

Methods
Human Subjects

All participants were drawn from the customer base of 23andMe, Inc., a consumer genetics company. This cohort has been described in detail previously [63][64]. Participants provided informed consent and participated in the research online. The research was conducted under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review).

Phenotype data

Participants in the discovery cohort were asked online as part of a medical history questionnaire: "Have you ever been diagnosed by a doctor with any of the following vision conditions?: Nearsightedness (near objects are clear, far objects are blurry) (Yes/No/I don't know)". If they answered "yes", they were asked, "At what age were you first diagnosed with nearsightedness (near objects are clear, far objects are blurry)? Your best guess is fine." Those reporting an age of onset outside of the range 0-100 were removed from analysis. All participants also reported their current age.

The replication cohort consisted of 23andMe customers who were not part of the discovery cohort (i.e., they did not provide a nearsightedness diagnosis). They answered a single question: "Did you wear glasses or other corrective eyewear for nearsightedness before the age of 10? (Yes/No/I'm not sure)".
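
For the survival analysis, each discovery participant becomes a (time, event) pair. The sketch below shows one plausible coding. It assumes that participants who report no diagnosis are censored at their current age, that the 0-100 onset range is inclusive, and that "I don't know" answers are excluded. None of these details is stated explicitly above. The function name and argument layout are hypothetical.

```python
from typing import Optional, Tuple


def to_survival_record(diagnosed: str, onset_age: Optional[float],
                       current_age: float) -> Optional[Tuple[float, int]]:
    """Convert questionnaire answers into (time, event) for a Cox model.

    diagnosed: "yes", "no", or "I don't know" (case-insensitive).
    Returns None for participants excluded from the analysis.
    """
    answer = diagnosed.strip().lower()
    if answer == "yes":
        # Exclusion rule from the text: onset ages outside 0-100 are dropped.
        if onset_age is None or not 0 <= onset_age <= 100:
            return None
        return (onset_age, 1)    # event observed at the reported onset age
    if answer == "no":
        return (current_age, 0)  # assumed censored at current age
    return None                  # "I don't know" or missing: assumed excluded


print(to_survival_record("yes", 12, 40))  # prints (12, 1)
print(to_survival_record("no", None, 45))  # prints (45, 0)
```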

Genotyping and imputation 

Participants were genotyped, and additional SNP genotypes were imputed against the August 2010 release of the 1000 Genomes data as described previously [65]. Briefly, participants were genotyped on at least one of three genotyping platforms: two based on the Illumina HumanHap550+ BeadChip, and the third based on the Illumina Human OmniExpress+ BeadChip. The platforms included assays for 586,916, 584,942, and 1,008,948 SNPs, respectively. Genotypes for a total of 11,914,767 SNPs were imputed in batches of roughly 10,000 individuals, grouped by genotyping platform. Of these, 7,087,609 met our three criteria:

- a minor allele frequency of at least 0.005;
- an average r² across batches of at least 0.5;
- a minimum r² across batches of at least 0.3.

The minimum r² requirement was added to filter out SNPs that imputed poorly in the batches consisting of the less dense platform.
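
The three SNP filters can be expressed as a single predicate over a SNP's minor allele frequency and its per-batch imputation r² values. The sketch below assumes that the thresholds are inclusive and that "average r²" is an unweighted mean across batches; a batch-size-weighted mean would be an equally plausible reading. The function name is hypothetical.

```python
from typing import Sequence


def passes_imputation_qc(maf: float, batch_r2: Sequence[float],
                         min_maf: float = 0.005, min_mean_r2: float = 0.5,
                         min_batch_r2: float = 0.3) -> bool:
    """Keep a SNP only if it clears the MAF, mean-r² and worst-batch r² filters."""
    if not batch_r2:
        return False  # no imputation quality information available
    mean_r2 = sum(batch_r2) / len(batch_r2)  # unweighted mean (assumption)
    return (maf >= min_maf
            and mean_r2 >= min_mean_r2
            and min(batch_r2) >= min_batch_r2)


# A SNP that imputes well on average but poorly in one batch is rejected.
print(passes_imputation_qc(0.2, [0.95, 0.25]))  # prints False
```

The last example shows why the minimum-r² rule is needed. Its mean r² (0.6) would pass, but one batch imputed poorly, which is the case the text describes for the less dense platform.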

Statistical analysis 

To minimize population substructure while maximizing statistical power, the study was limited to individuals with European ancestry. Ancestry was inferred from the genome-wide genotype data, and principal component analysis was performed as in [63][66]. No two participants shared more than 700 cM of DNA identical by descent (IBD; approximately the lower end of sharing between a pair of first cousins). IBD was calculated using the methods described in [67].
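
The text states the end condition (no retained pair shares more than 700 cM IBD) but not how individuals were chosen for removal. One simple, hypothetical way to reach that condition is a greedy filter. It repeatedly drops the person involved in the most over-threshold pairs, breaking ties by ID. This is an illustrative choice, not necessarily the procedure used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def unrelated_subset(people: Iterable[str],
                     ibd_cm: Dict[Tuple[str, str], float],
                     max_cm: float = 700.0) -> Set[str]:
    """Greedily remove individuals until no retained pair shares > max_cm of IBD DNA."""
    keep = set(people)
    related = [pair for pair, cm in ibd_cm.items() if cm > max_cm]
    while True:
        live = [(a, b) for a, b in related if a in keep and b in keep]
        if not live:
            return keep
        counts = Counter(person for pair in live for person in pair)
        # Drop the most-connected person; max() keeps the first maximum in
        # sorted order, so ties go to the smallest ID (deterministic).
        worst = max(sorted(counts), key=lambda person: counts[person])
        keep.discard(worst)


pairs = {("a", "b"): 900.0, ("a", "c"): 800.0, ("c", "d"): 50.0}
print(sorted(unrelated_subset(["a", "b", "c", "d"], pairs)))  # prints ['b', 'c', 'd']
```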

For the survival analysis, let the hazard function h(t) be the rate of developing myopia at time t. Then the Cox proportional hazards model is

$$ \log h(t) = \alpha(t) + \beta_G G + \beta_S S + \sum_{i=1}^{5} \beta_{P_i} \mathrm{PC}_i $$

for an arbitrary baseline hazard function α(t) and covariates G (genotype), S (sex), and PC_1, ..., PC_5 (projections onto principal components). G was coded as a dosage from 0 to 2, the estimated number of minor alleles present.
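
As an illustration of the per-SNP test, the sketch below fits a Cox model with genotype dosage as the only covariate. It uses Newton-Raphson on the partial likelihood and then computes a 1-df likelihood-ratio test for β_G = 0. It is a teaching sketch in pure Python, not the R code used in the study, and it makes three simplifying assumptions. Sex and the five principal components are omitted. Tied onset ages are handled with the Breslow approximation, whereas R's `coxph` defaults to Efron. The cost is O(n²) in the number of individuals.

```python
import math
from typing import Sequence, Tuple


def _risk_set_sums(beta: float, times: Sequence[float], x: Sequence[float],
                   t: float) -> Tuple[float, float, float]:
    """Sums of w, w*x and w*x^2 over everyone still at risk at age t (w = exp(beta*x))."""
    s0 = s1 = s2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= t:
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def partial_loglik(beta: float, times: Sequence[float], events: Sequence[int],
                   x: Sequence[float]) -> float:
    """Cox partial log-likelihood with Breslow handling of tied onset ages."""
    ll = 0.0
    for ti, di, xi in zip(times, events, x):
        if di:
            s0, _, _ = _risk_set_sums(beta, times, x, ti)
            ll += beta * xi - math.log(s0)
    return ll


def fit_beta(times: Sequence[float], events: Sequence[int], x: Sequence[float],
             tol: float = 1e-10, max_iter: int = 50) -> float:
    """Maximize the partial likelihood over beta by Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for ti, di, xi in zip(times, events, x):
            if di:
                s0, s1, s2 = _risk_set_sums(beta, times, x, ti)
                mean = s1 / s0
                score += xi - mean             # observed minus expected dosage
                info += s2 / s0 - mean * mean  # risk-set variance of dosage
        if info <= 0.0:
            raise ValueError("no genotype variation within risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


def lrt(times: Sequence[float], events: Sequence[int],
        x: Sequence[float]) -> Tuple[float, float, float]:
    """Return (beta_hat, hazard ratio, p-value) for H0: beta = 0 via a 1-df LRT."""
    beta_hat = fit_beta(times, events, x)
    stat = 2.0 * (partial_loglik(beta_hat, times, events, x)
                  - partial_loglik(0.0, times, events, x))
    stat = max(stat, 0.0)                 # guard against tiny negative rounding
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return beta_hat, math.exp(beta_hat), p


# Toy data: age is the onset age if event == 1, otherwise the current age (censored).
ages = [8, 12, 15, 20, 25, 30, 35, 40, 45, 50]
events = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
dosage = [2, 1, 2, 0, 1, 1, 2, 0, 0, 1]
beta_hat, hr, p = lrt(ages, events, dosage)
print(f"beta={beta_hat:.3f} HR={hr:.3f} p={p:.3g}")
```

In the study's setting, the resulting statistic would additionally be deflated by the genomic control factor before the 5 × 10^-8 threshold is applied.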

For each SNP, we fit a Cox proportional hazards model using R [68] and computed a p-value using a likelihood ratio test for the genotype term. All SNPs with p-values under 5 × 10^-8 after genomic control correction were considered genome-wide significant. The hazard ratio (HR) reported throughout can be interpreted as the multiplicative change in the rate of onset of myopia per copy of the minor allele (i.e., exp(β_G)). To test the proportional hazards assumption, we used cox.zph to test, for each significant SNP, whether the scaled Schoenfeld residuals were independent of time (Table S2). Replication p-values in Table 2 are one-sided p-values for a logistic regression model controlling for age, sex, and five principal components.
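
The proportional hazards check relies on Schoenfeld residuals. At each observed onset, the residual is the case's genotype minus the risk-weighted mean genotype of everyone still at risk. If a SNP's effect is constant over age, these residuals should show no trend with age. The sketch below computes unscaled residuals for a single covariate at a given β and reports their Pearson correlation with event time. It is a simplified stand-in for R's `cox.zph`, not a reimplementation of it; `cox.zph` works with scaled residuals, a transformation of time, and a formal test statistic. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def schoenfeld_residuals(beta: float, times: Sequence[float],
                         events: Sequence[int],
                         x: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (event_time, residual) pairs for a single-covariate Cox model."""
    out = []
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue
        # Risk set: everyone whose time is at least ti, weighted by exp(beta * x).
        weighted = [(math.exp(beta * xj), xj) for tj, xj in zip(times, x) if tj >= ti]
        total = sum(w for w, _ in weighted)
        expected = sum(w * xj for w, xj in weighted) / total
        out.append((ti, xi - expected))
    return out


def trend_correlation(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation of residuals with event time (0 means no linear trend)."""
    n = len(pairs)
    mt = sum(t for t, _ in pairs) / n
    mr = sum(r for _, r in pairs) / n
    cov = sum((t - mt) * (r - mr) for t, r in pairs)
    vt = sum((t - mt) ** 2 for t, _ in pairs)
    vr = sum((r - mr) ** 2 for _, r in pairs)
    if vt == 0.0 or vr == 0.0:
        return 0.0  # no variation, so no measurable trend
    return cov / math.sqrt(vt * vr)


res = schoenfeld_residuals(0.0, [1, 2, 3], [1, 1, 1], [1, 0, 1])
print([(t, round(r, 3)) for t, r in res])  # prints [(1, 0.333), (2, -0.5), (3, 0.0)]
print(round(trend_correlation(res), 3))
```

At the fitted β, the residuals sum to zero, because their sum is the score. A clear drift with age, such as the early-peaking LAMA2 effect described earlier, is the pattern that a formal test like cox.zph is designed to detect.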

For Figure 2, we computed a myopia propensity score for each individual as the (estimated) number of risk alleles among the 19 genome-wide significant SNPs. We then fit a Cox model including that score, sex, and five principal components. To estimate the proportion of variance explained by this model, we used a likelihood-based pseudo-r² (similar to the Nagelkerke pseudo-r² for logistic regression). That is, we calculated the variance explained as

$$ r^2 = 1 - \left( \frac{L_0}{L} \right)^{2/n}, $$

where L_0 is the null likelihood, L the likelihood for the full model, and n the sample size. This is one of several methods used to compute variance explained for Cox proportional hazards models [69].
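
A likelihood-based pseudo-r² can be computed from the log-likelihoods of the null and full models. The sketch below assumes the Cox-Snell form r² = 1 - (L0/L)^(2/n) and works on the log scale to avoid underflow. The optional rescaling by the maximum attainable value, 1 - L0^(2/n), gives a Nagelkerke-style version. Which exact form was used, and whether n counts individuals or myopia events, are assumptions here rather than details stated in the text.

```python
import math


def pseudo_r2(loglik_null: float, loglik_full: float, n: int,
              rescale: bool = False) -> float:
    """Likelihood-based pseudo-r^2 (Cox-Snell form; optionally Nagelkerke-rescaled)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # 1 - (L0 / L)^(2/n), computed as 1 - exp((2/n) * (log L0 - log L)).
    r2 = 1.0 - math.exp(2.0 * (loglik_null - loglik_full) / n)
    if rescale:
        max_r2 = 1.0 - math.exp(2.0 * loglik_null / n)  # 1 - L0^(2/n)
        r2 /= max_r2
    return r2


print(round(pseudo_r2(-1.0, 0.0, 2), 4))  # prints 0.6321
```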
Figure S1: Region plots for genome-wide significant associations. Colors depict the squared correlation (r²) of each SNP with the most associated SNP (shown in purple). Gray indicates SNPs for which r² information was missing.

[Region plot panels; surviving panel titles include (e) TOX/CA8, (f) KCNQ5, (g) GOLGA8B/GJD2, (h) SFRP1, (m) SHISA6, (n) KCNMA1, (o) DLG2, and (p) TJP2. The x-axes show position on the chromosome (Mb).]
Figure S2: Quantile-quantile plot for myopia survival analysis. Actual (λ-corrected) p-values versus the null.
[Surviving panel titles: (c) rs3138142, p = 0.97; (d) chr8:60178580, p = 2.4 × 10^-4.]

Figure S3: Smoothed log-hazard ratios as a function of age for four representative SNPs. In each plot, the straight line shows the estimated log-hazard ratio (beta) for each SNP in the proportional hazards model. The solid curve is a spline fit to beta estimated at different ages; the dotted curves are approximate 95% confidence intervals. The p-value in each caption is the result of a test of the proportional hazards assumption. The sign of all coefficients has been made positive for ease of comparison, so (a), (c), and (d) are flipped relative to the main text. Note that among the examples here, only (c) shows no evidence of deviation from the proportional hazards assumption.



Table S1: p-values for survival and case-control analyses. p-values for SNPs in the survival analysis used in the paper as well as in a case-control logistic regression on the same set of individuals. The survival analysis gives a smaller p-value for all but one SNP (rs3138142). It also yields 19 genome-wide significant SNPs (p < 5 × 10^-8), compared with 13 for the case-control analysis. p-values in both cases are adjusted for the genomic control inflation factor of 1.14.
Table S2: Tests of deviation from the proportional hazards assumption. p-values for significant SNPs for deviation from the proportional hazards assumption in the Cox model. For each SNP, we fit a Cox proportional hazards model including the SNP, sex, and five principal components as predictors. We then tested for independence of the scaled Schoenfeld residuals with time. SNPs in bold deviate significantly from this assumption after correction for 19 tests. Plots for four example SNPs are shown in Figure S3.